[SPARK-56429][DOCS] Clarify differences between nullValue and emptyValue CSV options #55405
yadavay-amzn wants to merge 1 commit into apache:master
…lue CSV options Update the CSV data source documentation to better explain how nullValue, emptyValue, and nanValue differ from each other and when each option applies.
@yadavay-amzn Thanks for the patch! Could you review my comment and see whether it makes sense to you?
@waterlx Replied to the JIRA comment asking for feedback from the previous developers who worked on this. It looked to me like PR #22234 is unrelated to SPARK-56429, so we still need this PR to fix the issue. Maybe I'm missing something?
@yadavay-amzn Sorry if my second comment was confusing; I did not quite follow your reply either. To explain: my point is, could you consider adding (2) to your PR? Does that make sense to you?
@mmolimar This is regarding #22234. @MaxGekk @HyukjinKwon Could you review this PR by @yadavay-amzn, since you also worked on or reviewed #22234? Thank you!
What changes were proposed in this pull request?
Update the CSV data source documentation to clarify how nullValue, emptyValue, and nanValue differ from each other:
- nullValue: when this exact string is encountered in the CSV input, Spark treats the field as SQL NULL.
- emptyValue: when a quoted empty string ("") is encountered, Spark substitutes this value instead. Only applies to string type columns.
- nanValue: when this string is encountered, Spark treats it as NaN for float/double columns.
Why are the changes needed?
The previous descriptions used the same pattern ("Sets the string representation of ...") for all three options, which is misleading because nullValue matches input to produce null, while emptyValue specifies the output value to substitute. See SPARK-56429.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Documentation-only change.
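For illustration only, the distinction drawn in the description (nullValue and nanValue match input, emptyValue names the substituted output) can be sketched as a toy model in plain Python. This is not Spark's implementation and was not part of the patch: `read_field`, its parameters, the `col_type` labels, and the option values `NA` and `<empty>` are hypothetical names chosen for the example, and the order of the checks is an assumption.

```python
import math
from typing import Optional, Union

Value = Optional[Union[str, float]]


def read_field(
    raw: str,
    col_type: str,
    null_value: str = "NA",        # illustrative value, not Spark's default
    empty_value: str = "<empty>",  # illustrative value, not Spark's default
    nan_value: str = "NaN",        # illustrative value
) -> Value:
    """Toy model of the three rules as worded in this PR (assumed ordering)."""
    # emptyValue: a quoted empty string in a string column is replaced
    # by the configured value, so the option describes the *output*.
    if raw == '""' and col_type == "string":
        return empty_value

    # Strip one pair of surrounding double quotes, if present.
    quoted = len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"'
    token = raw[1:-1] if quoted else raw

    # nullValue: the option is matched against the *input* and yields null.
    if token == null_value:
        return None

    if col_type == "double":
        # nanValue: the option is matched against the input and yields NaN.
        if token == nan_value:
            return math.nan
        # Malformed numeric input is out of scope; float() simply raises.
        return float(token)

    return token


print(read_field("NA", "string"))    # None
print(read_field('""', "string"))    # <empty>
print(read_field("NaN", "double"))   # nan
```

In this model, changing `null_value` or `nan_value` changes which input strings are recognized, while changing `empty_value` changes what the reader returns for a quoted empty string.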
Was this patch authored or co-authored using generative AI tooling?
Yes.